Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer


Abstract

We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture, which simultaneously learns layout information, visual features, and textual semantics. Contrary to previous approaches, we rely on a decoder capable of unifying a variety of problems involving natural language. The layout is represented as an attention bias and complemented with contextualized visual information, while the core of our model is a pretrained encoder-decoder Transformer. Our novel approach achieves state-of-the-art results in extracting information from documents and answering questions which demand layout understanding (DocVQA, CORD, SROIE). At the same time, we simplify the process by employing an end-to-end model.
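To make the phrase "attention bias" concrete, here is a minimal sketch of scaled dot-product attention in which a layout-derived term is added to the scores before the softmax. It is an illustration only and not the authors' implementation: the function names `attention_with_bias` and `layout_bias`, the bounding-box format `(x0, y0, x1, y1)`, and the fixed distance penalty are all assumptions made for this example. In practice such a bias would typically be learned rather than hand-written.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def layout_bias(boxes, scale=1.0):
    """Hypothetical bias: penalise token pairs by the Manhattan distance
    between the centres of their bounding boxes (x0, y0, x1, y1)."""
    centres = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    return [
        [-scale * (abs(cx - dx) + abs(cy - dy)) for dx, dy in centres]
        for cx, cy in centres
    ]


def attention_with_bias(queries, keys, values, bias):
    """Single-head attention where bias[i][j] is added to the score
    between query i and key j before the softmax."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) + bias[i][j]
            for j, k in enumerate(keys)
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three tokens: two on the same text line, one far below them.
boxes = [(0, 0, 1, 1), (1, 0, 2, 1), (0, 10, 1, 11)]
token_vectors = [[0.0, 0.0]] * 3  # identical content, so only layout matters
_, weights = attention_with_bias(
    token_vectors, token_vectors, [[1.0], [2.0], [3.0]], layout_bias(boxes)
)
print(weights[0])  # the first token attends more to its neighbour than to the distant token
```

Because the bias enters the scores additively, tokens with identical textual content can still attend differently depending on where they sit on the page.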


Similar resources

Document image understanding: geometric and logical layout

Document Image Understanding encompasses the technology required to make paper documents equivalent to other computer exchange media like floppies, tapes, and cdroms. The physical reader of the paper document is the scanner just like the physical reader of the floppy is the floppy drive and the physical reader of the tape cartridge is the tape cartridge drive, and the physical reader of the cdrom is ...


Geometric Layout Analysis Techniques for Document Image Understanding: a Review

Document Image Understanding (DIU) is an interesting research area with a large variety of challenging applications. Researchers have worked for decades on this topic, as witnessed by the scientific literature. The main purpose of the present report is to describe the current status of DIU with particular attention to two subprocesses: document skew angle estimation and page decomposition. Sev...


Integrated Text and Image Understanding for Document Understanding

Because of the complexity of documents and the variety of applications which must be supported, document understanding requires the integration of image understanding with text understanding. Our document understanding technology is implemented in a system called IDUS (Intelligent Document Understanding System), which creates the data for a text retrieval application and the automatic generat...


Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)

Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions. Both of them deteriorate the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...


Document Image Layout Comparison and Classification

This paper describes features and methods for document image comparison and classification at the spatial layout level. The methods are useful for visual similarity based document retrieval as well as fast algorithms for initial document type classification without OCR. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes reg...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-86331-9_47